We develop Bayesian neural networks (BNNs) that can model generic nonlinearities and time variation for (possibly large sets of) macroeconomic and financial variables. From a methodological point of view, we allow for a general specification of networks that can be applied to either dense or sparse datasets and that combines various activation functions, a possibly very large number of neurons, and stochastic volatility (SV) for the error term. From a computational point of view, we develop fast and efficient estimation algorithms for the general BNNs we introduce. From an empirical point of view, we show, both with simulated data and with a set of common macro and financial applications, that our BNNs can be of practical use, particularly for observations in the tails of the cross-sectional or time series distributions of the target variables.
translated by 谷歌翻译
Neural Radiance Fields (NeRFs) are coordinate-based implicit representations of 3D scenes that use a differentiable rendering procedure to learn a representation of an environment from images. This paper extends NeRFs to handle dynamic scenes in an online fashion. We do so by introducing a particle-based parametric encoding, which allows the intermediate NeRF features -- now coupled to particles in space -- to be moved with the dynamic geometry. We backpropagate the NeRF's photometric reconstruction loss into the position of the particles in addition to the features they are associated with. The position gradients are interpreted as particle velocities and integrated into positions using a position-based dynamics (PBS) physics system. Introducing PBS into the NeRF formulation allows us to add collision constraints to the particle motion and creates future opportunities to add other movement priors into the system such as rigid and deformable body constraints. We show that by allowing the features to move in space, we incrementally adapt the NeRF to the changing scene.
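The particle update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `step_particles`, the step size `lr`, the minimum separation `min_dist`, and the use of the negative gradient as the velocity are all hypothetical choices made for this example. The collision constraint is a simple pairwise projection in the style of position-based dynamics.

```python
import math


def step_particles(positions, gradients, lr=0.1, min_dist=0.5, iters=4):
    """Move particles along their loss gradients, then resolve collisions.

    Assumption: the velocity of a particle is taken to be -lr * gradient
    (a plain descent step). The abstract only says that position gradients
    are interpreted as velocities.
    """
    # 1) Integrate the velocities into predicted positions.
    predicted = [
        [p - lr * g for p, g in zip(pos, grad)]
        for pos, grad in zip(positions, gradients)
    ]

    # 2) Project out collisions: push apart any pair closer than min_dist.
    for _ in range(iters):
        for a in range(len(predicted)):
            for b in range(a + 1, len(predicted)):
                delta = [x - y for x, y in zip(predicted[a], predicted[b])]
                dist = math.sqrt(sum(d * d for d in delta))
                if 0.0 < dist < min_dist:
                    # Each particle takes half of the required correction.
                    corr = 0.5 * (min_dist - dist) / dist
                    for k in range(len(delta)):
                        predicted[a][k] += corr * delta[k]
                        predicted[b][k] -= corr * delta[k]
    return predicted


# Two particles whose gradients would drive them closer than min_dist.
start = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
grads = [[-4.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
moved = step_particles(start, grads)
separation = math.dist(moved[0], moved[1])
print(moved, separation)
```

Without the projection step the two particles would end up 0.2 apart; with it they are held at roughly the minimum separation. Exactly coincident particles are skipped in this sketch because no separating direction is defined for them.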
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and are embedded into a latent space from which they can be sampled as actions by a high-level RL agent. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning tasks that are structurally similar to those used to construct the skill space. We first propose accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards sampling only skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation, enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration, outperforming prior works. Code and videos are available on our project website: https://krishanrana.github.io/reskill.
Recognizing a word shortly after it is spoken is an important requirement for automatic speech recognition (ASR) systems in real-world scenarios. As a result, a large body of work on streaming audio-only ASR models has been presented in the literature. However, streaming audio-visual automatic speech recognition (AV-ASR) has received little attention in earlier works. In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture. The audio and the visual encoder neural networks are both based on the conformer architecture, which is made streamable using chunk-wise self-attention (CSA) and causal convolution. Streaming recognition with a decoder neural network is realized by using the triggered attention technique, which performs time-synchronous decoding with joint CTC/attention scoring. For frame-level ASR criteria, such as CTC, a synchronized response from the audio and visual encoders is critical for a joint AV decision-making process. In this work, we propose a novel alignment regularization technique that promotes synchronization of the audio and visual encoders, which in turn results in better word error rates (WERs) at all SNR levels for streaming and offline AV-ASR models. The proposed AV-ASR model achieves WERs of 2.0% and 2.6% on the Lip Reading Sentences 3 (LRS3) dataset in an offline and online setup, respectively, both of which are state-of-the-art results when no external training data are used.
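To make the idea of chunk-wise self-attention concrete, the sketch below builds the kind of attention mask that restricts each frame to its own chunk and a limited number of preceding chunks. It is only an illustration: the function name `chunkwise_attention_mask` and the `left_chunks` parameter are assumptions made for this example, and the abstract does not state the chunk size or the amount of left context the authors use.

```python
def chunkwise_attention_mask(num_frames, chunk_size, left_chunks=1):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Frames are grouped into consecutive chunks of `chunk_size`. A frame may
    look at every frame in its own chunk and in up to `left_chunks` previous
    chunks, but never at a later chunk, which keeps the encoder streamable.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    mask = []
    for i in range(num_frames):
        chunk_i = i // chunk_size
        row = []
        for j in range(num_frames):
            chunk_j = j // chunk_size
            row.append(chunk_i - left_chunks <= chunk_j <= chunk_i)
        mask.append(row)
    return mask


mask = chunkwise_attention_mask(num_frames=6, chunk_size=2, left_chunks=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a real encoder such a mask would be applied to the attention scores before the softmax. The look-ahead latency is bounded by the chunk size, because a frame waits at most until the end of its own chunk.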
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
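We show that ensembling effectively quantifies model uncertainty in Neural Radiance Fields (NeRFs) if a density-aware epistemic uncertainty term is considered. The naive ensembles investigated in prior work simply use rendered RGB images to quantify the model uncertainty caused by conflicting explanations of the observed scene. In contrast, we additionally consider the termination probabilities along individual rays to identify epistemic model uncertainty due to a lack of knowledge about the parts of a scene unobserved during training. We achieve new state-of-the-art performance across established uncertainty quantification benchmarks for NeRFs, outperforming methods that require complex changes to the NeRF architecture and training regime. We furthermore show that NeRF uncertainty can be used for next-best view selection and model refinement.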
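A per-ray uncertainty of the kind described above could be computed roughly as follows. This is a hedged sketch, not the authors' formula: the function name `ray_uncertainty`, the use of the channel-averaged RGB variance across ensemble members, and the squared shortfall of the mean accumulated termination probability as the density-aware term are all assumptions chosen to illustrate the idea.

```python
from statistics import mean, pvariance


def ray_uncertainty(member_rgbs, member_weights):
    """Combine RGB disagreement with a density-aware term for one ray.

    member_rgbs:    one (r, g, b) tuple per ensemble member.
    member_weights: one list per member holding the termination
                    probabilities of the samples along the ray.
    """
    # Disagreement between members about the rendered colour.
    rgb_term = mean(pvariance(channel) for channel in zip(*member_rgbs))

    # Assumed density-aware term: if the termination probabilities sum to
    # much less than 1, the ray mostly passes through space the models
    # know nothing about, so the uncertainty should be high.
    mean_opacity = mean(sum(weights) for weights in member_weights)
    density_term = (1.0 - mean_opacity) ** 2

    return rgb_term + density_term


# A ray through a well-observed surface: members agree, the ray terminates.
observed = ray_uncertainty(
    [(0.2, 0.4, 0.6), (0.2, 0.4, 0.6)],
    [[0.5, 0.5], [0.5, 0.5]],
)
# A ray through unobserved space: same colours, but it barely terminates.
unobserved = ray_uncertainty(
    [(0.2, 0.4, 0.6), (0.2, 0.4, 0.6)],
    [[0.1, 0.1], [0.1, 0.1]],
)
print(observed, unobserved)
```

The two example rays have identical rendered colours, so an RGB-only ensemble would rate them as equally certain. The density-aware term is what separates them.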
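Many high-performing works on out-of-distribution (OOD) detection use real or synthetically generated outlier data to regularize model confidence; however, they often require retraining of the base network or specialized model architectures. Our work demonstrates that Noisy Inliers Make Great Outliers (NIMGO) in the challenging field of OOD object detection. We hypothesize that synthetic outliers need only be minimally perturbed variants of the in-distribution (ID) data in order to train a discriminator to identify OOD samples, without expensive retraining of the base network. To test our hypothesis, we generate a synthetic outlier set by applying an additive-noise perturbation at the image or bounding-box level. An auxiliary feature-monitoring multilayer perceptron (MLP) is then trained to detect OOD feature representations using the perturbed ID samples as a proxy. During testing, we demonstrate that the auxiliary MLP distinguishes ID samples from OOD samples at a state-of-the-art level on the OpenImages dataset. Extensive additional ablations provide empirical evidence in support of our hypothesis.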
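Crowdsourcing technologies rely on groups of people to input information that may be critical for decision-making. This work examines obfuscation in the context of reporting technologies. We show that the widespread use of reporting platforms comes with unique security and privacy implications, and we introduce a threat model and a corresponding taxonomy to outline some of the many attack vectors in this space. We then perform an empirical analysis of a dataset of call logs from a controversial, real-world reporting hotline and identify coordinated obfuscation strategies that are intended to hinder the platform's legitimacy. We propose a variety of statistical measures to quantify the strength of this obfuscation strategy with respect to the structural and semantic characteristics of the reporting attacks in our dataset.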
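Algorithms now play a key role in many technological systems that control or affect various aspects of our lives. As a result, providing explanations that address the needs of users and organizations is increasingly expected by laws and regulations, codes of conduct, and the public. However, because laws and regulations do not prescribe how to meet such expectations, organizations are often left to devise their own approaches to explainability, which inevitably increases the cost of compliance and good governance. We therefore propose "Explainability-by-Design", a holistic methodology characterized by proactive measures to include explanation capability in the design of decision-making systems. This paper describes the technical steps of the Explainability-by-Design methodology in a software engineering workflow, which implement explanation capability from requirements elicited by domain experts for a specific application context. The output of the Explainability-by-Design methodology is a set of configurations that allow a reusable service, called the Explanation Assistant, to exploit logs provided by applications and to create provenance traces. These traces can be queried to extract relevant data points, which in turn can be used in explanation plans to construct explanations personalized to their consumers. By following these steps, organizations will be able to design their decision-making systems to produce explanations that meet the specified requirements, whether these stem from laws, regulations, or business needs. We applied the methodology to two applications, resulting in deployments of the Explanation Assistant that demonstrate explanation capabilities. Finally, the associated development costs were measured, showing that the approach to constructing explanations is tractable in terms of development time, which can be as low as two hours per explanation sentence.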
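As automated decision-making solutions are increasingly applied to all aspects of everyday life, the capability to generate meaningful explanations for a variety of stakeholders (i.e., decision-makers, decision recipients, auditors, regulators...) becomes crucial. In this paper, we present a taxonomy of explanations that was developed as part of a holistic "Explainability-by-Design" approach for the purposes of the project. The taxonomy was built with a view to producing explanations for a wide range of requirements stemming from a variety of regulatory frameworks or policies set at the organizational level, either to translate high-level compliance requirements or to meet business needs. The taxonomy comprises nine dimensions. It is used as a stand-alone classifier of explanations conceived as detective controls, in order to aid supportive automated compliance strategies. A machine-readable format of the taxonomy is provided in the form of a light ontology, and the benefits of starting the Explainability-by-Design journey with such a taxonomy are demonstrated through a series of examples.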